
Add --skip-duplicates flag to wp media import #241

Merged
swissspidy merged 18 commits into main from copilot/skip-duplicates-feature
May 5, 2026

Conversation

Contributor

Copilot AI commented Mar 19, 2026

  • Add --skip-duplicates flag to import() docblock and @param type hint
  • Add $skips = 0 variable in import()
  • Add duplicate detection for local files using _wp_attached_file basename match
  • Add duplicate detection for remote files using _wp_attached_file basename match (extracted from URL)
  • Use WP_Query with meta_query (OR relation) for duplicate detection instead of raw SQL
  • Use explode() instead of strtok() consistently for URL query-string stripping
  • Pass $skips to report_batch_operation_results only when --skip-duplicates is active
  • Add find_duplicate_attachment() private helper method
  • Fix duplicate detection for WP 5.3+ scaled images: derive basename-scaled.ext from the input basename and include it in the _wp_attached_file search
  • Fix --skip-duplicates duplicate check to use the --file_name-resolved name when --file_name is provided, so files previously imported with a custom name are correctly detected as duplicates
  • Add Behat test scenarios for --skip-duplicates (local file, remote file, mixed batch, --file_name-aware duplicate detection)
  • Add @require-wp-5.3 Behat scenario to explicitly exercise scaled-image duplicate detection
  • Fix CI failures: update behat-data URLs from wp-cli.org to wp-cli.github.io (matching commit Update behat-data URLs in tests #250)
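The scaled-image item above can be illustrated with a minimal sketch. It assumes the check only needs the input basename plus a `-scaled` sibling; `candidate_basenames()` is a hypothetical name used for illustration and is not the PR's `find_duplicate_attachment()` implementation.

```php
<?php
// Hypothetical helper (illustration only, not the code from this PR):
// list the basenames that should be treated as "the same file" when
// looking for an existing attachment.
function candidate_basenames( string $basename ): array {
    $variants = array( $basename );

    $name = pathinfo( $basename, PATHINFO_FILENAME );  // "image" for "image.jpg"
    $ext  = pathinfo( $basename, PATHINFO_EXTENSION ); // "jpg" for "image.jpg"

    // WP 5.3+ may store a large image as "<name>-scaled.<ext>", so a
    // previously imported "image.jpg" can appear as "image-scaled.jpg".
    if ( '' !== $name && '' !== $ext ) {
        $variants[] = $name . '-scaled.' . $ext;
    }

    return $variants;
}

$variants = candidate_basenames( 'image.jpg' );
// $variants is array( 'image.jpg', 'image-scaled.jpg' )
```

Each returned variant would then be compared against the stored `_wp_attached_file` values.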
Original prompt

This section details the original issue you should resolve

<issue_title>Skip Duplicates</issue_title>
<issue_description>## Feature Request

Describe your use case and the problem you are facing

I have a directory that I regularly update with images, audio, video, and documents. I want to run wp media import on this directory but have it skip any files that have already been imported. I can't think of a way to do this today other than some complicated scripting on my side that would compare file names and date modified between the two systems.

Describe the solution you'd like

I would love a flag that I could run as part of the command that could automatically do this for me. I plan on setting up a batch job to continuously import from the directory, so it would be great if I could just run wp media import --skip_existing or something along those lines.
</issue_description>

Comments on the Issue (you are @copilot in this section)


github-actions Bot added the command:media-import (Related to 'media import' command), help-wanted (Extra attention is needed), scope:distribution (Related to distribution), and state:unconfirmed labels Mar 19, 2026
Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add skip duplicates functionality" to "Add --skip-duplicates flag to wp media import" Mar 19, 2026
Copilot AI requested a review from swissspidy March 19, 2026 23:01
@swissspidy

This comment was marked as resolved.

…s for non-skip-duplicates paths

Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>

This comment was marked as resolved.

This comment was marked as resolved.

swissspidy and others added 2 commits March 20, 2026 09:09
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ile handling

Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
@swissspidy

This comment was marked as resolved.

…age_file

Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>

This comment was marked as resolved.

@swissspidy

This comment was marked as resolved.

This comment was marked as resolved.

Comment thread src/Media_Command.php Outdated
…instead of raw SQL

Agent-Logs-Url: https://github.com/wp-cli/media-command/sessions/0f126b51-1ee9-48c9-9031-92a31261b462

Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
Copilot AI requested a review from swissspidy May 4, 2026 09:19
swissspidy removed the state:unconfirmed and help-wanted (Extra attention is needed) labels May 4, 2026
swissspidy marked this pull request as ready for review May 4, 2026 13:08
swissspidy requested a review from a team as a code owner May 4, 2026 13:08
@swissspidy
Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a --skip-duplicates flag to the wp media import command, allowing users to skip files already present in the media library. It includes new Behat tests and logic to detect duplicates based on the _wp_attached_file meta key. Feedback identifies a potential performance improvement by caching basename calculations for remote files. More importantly, the use of LIKE in the duplicate detection query may cause false positives, and the removal of a compatibility check for wp_get_additional_image_sizes() breaks support for WordPress versions older than 4.7.0.

Comment thread src/Media_Command.php Outdated
        }
    } else {
        if ( Utils\get_flag_value( $assoc_args, 'skip-duplicates' ) ) {
            $existing = $this->find_duplicate_attachment( (string) explode( '?', Path::basename( $file ), 2 )[0] );
medium

The expression (string) explode( '?', Path::basename( $file ), 2 )[0] is calculated twice for each remote file (here and again on line 653). For better performance in bulk import operations and improved maintainability, consider extracting this to a variable (e.g., $basename) before the duplicate check and reusing it.
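A minimal sketch of that suggestion, assuming the basename is computed once per remote file and reused. It substitutes PHP's built-in `basename()` for `Path::basename()` so it runs without dependencies, and `remote_basename()` is a hypothetical name, not a method in this PR.

```php
<?php
// Hypothetical stand-in for the repeated expression. PHP's built-in
// basename() replaces Path::basename() so the sketch has no dependencies.
function remote_basename( string $url ): string {
    // Keep only the part before the first "?" to drop the query string.
    return explode( '?', basename( $url ), 2 )[0];
}

$file = 'https://example.com/uploads/photo.jpg?ver=2';

// Compute once, then reuse for both the duplicate check and the
// later file-name handling instead of repeating the expression.
$basename = remote_basename( $file );
// $basename is 'photo.jpg'
```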

Comment thread src/Media_Command.php
Comment on lines +1137 to +1141
        'key' => '_wp_attached_file',
        'value' => '/' . $variant,
        'compare' => 'LIKE',
    );
}
medium

Using LIKE with a leading slash for duplicate detection can lead to false positives. In WordPress meta_query, a LIKE comparison wraps the value in wildcards, resulting in a query like meta_value LIKE '%/image.jpg%'. This will incorrectly match files like image.jpg.webp or image.jpg.bak when searching for image.jpg, causing the import to be skipped for non-duplicate files. To be more precise, you might consider using raw SQL with $wpdb to match the end of the string exactly (e.g., meta_value = %s OR meta_value LIKE %s with '%/' . $variant) or fetching the results and verifying the basename in PHP.
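The "verify the basename in PHP" option could look roughly like the sketch below. It assumes the `LIKE` query results are available as an array of attachment ID => `_wp_attached_file` value; `exact_basename_matches()` and the sample rows are hypothetical and are not taken from this PR.

```php
<?php
// Hypothetical post-filter (illustration only): keep attachment IDs whose
// stored path ends in exactly one of the wanted basenames, discarding
// loose LIKE matches such as "image.jpg.webp".
function exact_basename_matches( array $attached_files, array $variants ): array {
    $ids = array();
    foreach ( $attached_files as $id => $path ) {
        // Strict comparison on the final path segment only.
        if ( in_array( basename( $path ), $variants, true ) ) {
            $ids[] = $id;
        }
    }
    return $ids;
}

// Made-up rows standing in for what a LIKE '%image.jpg%' style query might return.
$rows = array(
    11 => '2026/05/image.jpg',
    12 => '2026/05/image.jpg.webp',
    13 => 'image-scaled.jpg',
    14 => '2026/05/image.jpg.bak',
);

$ids = exact_basename_matches( $rows, array( 'image.jpg', 'image-scaled.jpg' ) );
// $ids is array( 11, 13 )
```

Filtering in PHP keeps the `WP_Query` approach and avoids raw SQL, at the cost of fetching a few extra rows when similarly named files exist.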

Comment thread src/Media_Command.php

This comment was marked as resolved.

@swissspidy

This comment was marked as resolved.

…quire-wp-5.3 Behat scenario

Agent-Logs-Url: https://github.com/wp-cli/media-command/sessions/43c9f132-60d4-48c3-93e9-1e252883fa46

Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>

This comment was marked as resolved.

swissspidy added this to the 2.2.8 milestone May 5, 2026
swissspidy merged commit 2e40550 into main May 5, 2026
62 checks passed
swissspidy deleted the copilot/skip-duplicates-feature branch May 5, 2026 08:35
Labels

command:media-import (Related to 'media import' command), scope:distribution (Related to distribution)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skip Duplicates

3 participants